The ability of continual learning systems to transfer knowledge from previously seen tasks in order to maximize performance on new tasks is a significant challenge for the field, limiting the applicability of continual learning solutions to realistic settings. Consequently, this study aims to broaden our understanding of transfer and its driving forces in the specific case of continual reinforcement learning. We adopt SAC as the underlying RL algorithm and Continual World as a suite of continuous control tasks. We systematically study how different components of SAC (the actor and the critic, exploration, and data) affect transfer efficacy, and we provide recommendations regarding various modeling options. The best set of choices, dubbed ClonEx-SAC, is evaluated on the recent Continual World benchmark. ClonEx-SAC achieves an 87% final success rate, compared to 80% for PackNet, the best method in the benchmark. Moreover, transfer grows from 0.18 to 0.54 according to the metric provided by Continual World.
We introduce a new training paradigm that enforces interval constraints on the neural network parameter space to control forgetting. Contemporary continual learning (CL) methods train neural networks efficiently from a stream of data while reducing the negative impact of catastrophic forgetting, but they do not provide any guarantee that network performance will not deteriorate uncontrollably over time. In this work, we show how to put bounds on forgetting by reformulating continual learning of a model as a continual contraction of its parameter space. To that end, we propose Hyperrectangle Training, a new training methodology in which each task is represented by a hyperrectangle in parameter space, fully contained within the hyperrectangle of the previous task. This formulation reduces the NP-hard CL problem to polynomial time while providing full resilience against forgetting. We validate our claims by developing InterContiNet (Interval Continual Learning), an algorithm that leverages interval arithmetic to efficiently model parameter regions as hyperrectangles. Through experimental results, we show that our approach performs well in a continual learning setting without storing data from previous tasks.
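To make the two ingredients of this abstract concrete, here is a minimal sketch (illustrative only, not the authors' InterContiNet code): propagating a parameter hyperrectangle through a linear layer with interval arithmetic, and checking that a new task's hyperrectangle stays nested inside the previous one. All names and the toy sizes are assumptions.

```python
import numpy as np

def interval_matvec(w_low, w_high, x_low, x_high):
    """Bound W @ x when W and x each lie in an axis-aligned interval (hyperrectangle)."""
    # Each product w_ij * x_j is bounded by the min/max over the four endpoint
    # combinations; summing those bounds bounds the whole dot product.
    prods = np.stack([w_low * x_low, w_low * x_high, w_high * x_low, w_high * x_high])
    return prods.min(axis=0).sum(axis=1), prods.max(axis=0).sum(axis=1)

def contained(new_low, new_high, old_low, old_high):
    """True if the new task's hyperrectangle lies inside the previous task's one."""
    return bool(np.all(new_low >= old_low) and np.all(new_high <= old_high))

rng = np.random.default_rng(0)
w = rng.normal(size=(3, 4))
x = rng.normal(size=4)
y_low, y_high = interval_matvec(w - 0.1, w + 0.1, x, x)   # task-1 parameter region
print(y_low, y_high)
print(contained(w - 0.05, w + 0.05, w - 0.1, w + 0.1))    # nested task-2 region -> True
```

Nesting each task's region inside the previous one is what gives the "no uncontrolled deterioration" guarantee: any parameter vector valid for the new task is, by construction, still inside the region certified for all earlier tasks.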
现代生成型号在包括图像或文本生成和化学分子建模的各种任务中获得优异的品质。然而,现有方法往往缺乏通过所要求的属性产生实例的基本能力,例如照片中的人的年龄或产生的分子的重量。包含此类额外的调节因子将需要重建整个架构并从头开始优化参数。此外,难以解除选定的属性,以便仅在将其他属性中执行不变的同时执行编辑。为了克服这些限制,我们提出插件(插件生成网络),这是一种简单而有效的生成技术,可以用作预先训练的生成模型的插件。我们的方法背后的想法是使用基于流的模块将纠缠潜在的潜在表示转换为多维空间,其中每个属性的值被建模为独立的一维分布。因此,插件可以生成具有所需属性的新样本,以及操作现有示例的标记属性。由于潜在代表的解散,我们甚至能够在数据集中的稀有或看不见的属性组合生成样本,例如具有灰色头发的年轻人,有妆容的男性或胡须的女性。我们将插入与GaN和VAE模型组合并将其应用于图像和化学分子建模的条件生成和操纵。实验表明,插件保留了骨干型号的质量,同时添加控制标记属性值的能力。
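The editing workflow described above can be sketched in a few lines. The sketch below is schematic (the paper uses a trained normalizing flow; here the invertible module is stubbed with an orthogonal linear map) and all names are illustrative: map the backbone's latent code into the attribute space, overwrite one attribute coordinate, and invert the map.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))   # invertible stand-in for the flow module

flow = lambda z: Q @ z            # entangled latent -> space with one axis per attribute
flow_inv = lambda c: Q.T @ c      # back to the backbone's latent space

z = rng.normal(size=dim)          # latent code from a pre-trained GAN/VAE backbone
c = flow(z)
c[0] = 2.5                        # set the value of attribute 0 (e.g. "gray hair")
z_edited = flow_inv(c)            # decode z_edited with the frozen backbone
print(np.round(z_edited - z, 3))
```

Because only the small invertible module is trained, the pre-trained backbone stays frozen, which is what lets the plugin preserve the backbone's generation quality.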
The problem of reducing the processing time of large deep learning models is a fundamental challenge in many real-world applications. Early exit methods strive towards this goal by attaching additional Internal Classifiers (ICs) to intermediate layers of a neural network. ICs can quickly return predictions for easy examples and, as a result, reduce the average inference time of the whole model. However, if a particular IC does not decide to return an answer early, its predictions are discarded and its computation is effectively wasted. To solve this issue, we introduce Zero Time Waste (ZTW), a novel approach in which each IC reuses the predictions returned by its predecessors by (1) adding direct connections between ICs and (2) combining previous outputs in an ensemble-like manner. We conduct extensive experiments across various datasets and architectures to demonstrate that ZTW achieves a significantly better accuracy vs. inference time trade-off than other recently proposed early exit methods.
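A minimal sketch of the reuse-and-exit loop, under simplifying assumptions (a weighted running ensemble of IC logits and a confidence threshold; the paper's actual combination scheme and trained weights may differ):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def ztw_style_inference(ic_logits, weights, threshold=0.9):
    """ic_logits: per-IC logit vectors for one example, ordered shallow -> deep."""
    ensemble = np.zeros_like(ic_logits[0])
    total_w = 0.0
    for k, (logits, w) in enumerate(zip(ic_logits, weights)):
        total_w += w
        ensemble += w * logits                    # reuse predecessors' outputs instead of discarding them
        probs = softmax(ensemble / total_w)
        if probs.max() >= threshold:              # confident enough: exit early at IC k
            return int(probs.argmax()), k
    return int(probs.argmax()), len(ic_logits) - 1  # fall through to the final classifier

logits_per_ic = [np.array([0.2, 0.1, 0.0]), np.array([2.0, 0.1, 0.0]), np.array([4.0, 0.1, 0.0])]
print(ztw_style_inference(logits_per_ic, weights=[0.2, 0.3, 0.5]))
```

The key point the abstract makes is visible in the loop: an IC that does not trigger an early exit still contributes its logits to every later decision, so no computation is thrown away.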
This short report reviews the current state of the research and methodology on theoretical and practical aspects of Artificial Neural Networks (ANN). It was prepared to gather state-of-the-art knowledge needed to construct complex, hypercomplex and fuzzy neural networks. The report reflects the individual interests of the authors and by no means can be treated as a comprehensive review of the ANN discipline. Considering the fast development of this field, a detailed review is currently impossible within a reasonable number of pages. The report is an outcome of the Project 'The Strategic Research Partnership for the mathematical aspects of complex, hypercomplex and fuzzy neural networks' meeting at the University of Warmia and Mazury in Olsztyn, Poland, organized in September 2022.
Transfer learning is a popular technique for improving the performance of neural networks. However, existing methods are limited to transferring parameters between networks with the same architecture. We present a method for transferring parameters between neural networks with different architectures. Our method, called DPIAT, uses dynamic programming to match blocks and layers between architectures and transfer parameters efficiently. Compared to existing parameter prediction and random initialization methods, it significantly improves training efficiency and validation accuracy. In experiments on ImageNet, our method improved validation accuracy by an average of 1.6 times after 50 epochs of training. DPIAT allows both researchers and neural architecture search systems to modify trained networks and reuse knowledge, avoiding the need for retraining from scratch. We also introduce a network architecture similarity measure, enabling users to choose the best source network without any training.
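As a rough illustration of the dynamic-programming matching idea (a toy sketch only; DPIAT's actual cost function and block-level handling are not specified here), the snippet below aligns two lists of layer widths with an edit-distance-style DP and recovers which source layer would donate parameters to which target layer.

```python
def match_layers(src_widths, tgt_widths, skip_cost=1.0):
    """Return (total_cost, matched (src_idx, tgt_idx) pairs) via edit-distance-style DP."""
    n, m = len(src_widths), len(tgt_widths)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * skip_cost
    for j in range(1, m + 1):
        dp[0][j] = j * skip_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = abs(src_widths[i-1] - tgt_widths[j-1]) / max(src_widths[i-1], tgt_widths[j-1])
            dp[i][j] = min(dp[i-1][j-1] + mismatch,   # transfer src layer i-1 into tgt layer j-1
                           dp[i-1][j] + skip_cost,    # drop a source layer
                           dp[i][j-1] + skip_cost)    # leave a target layer randomly initialized
    # Backtrack to recover which layers were matched.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        mismatch = abs(src_widths[i-1] - tgt_widths[j-1]) / max(src_widths[i-1], tgt_widths[j-1])
        if dp[i][j] == dp[i-1][j-1] + mismatch:
            pairs.append((i - 1, j - 1)); i, j = i - 1, j - 1
        elif dp[i][j] == dp[i-1][j] + skip_cost:
            i -= 1
        else:
            j -= 1
    return dp[n][m], pairs[::-1]

print(match_layers([64, 128, 256, 512], [64, 96, 256, 256, 512]))
```

Matched weight tensors would then be copied (cropped or padded as needed) into the target network before fine-tuning, which is what avoids retraining from scratch.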
Petrov-Galerkin formulations with optimal test functions allow for the stabilization of finite element simulations. In particular, given a discrete trial space, the optimal test space induces a numerical scheme delivering the best approximation in terms of a problem-dependent energy norm. This ideal approach has two shortcomings: first, we need to explicitly know the set of optimal test functions; and second, the optimal test functions may have large supports inducing expensive dense linear systems. Nevertheless, parametric families of PDEs are an example where it is worth investing some (offline) computational effort to obtain stabilized linear systems that can be solved efficiently, for a given set of parameters, in an online stage. Therefore, as a remedy for the first shortcoming, we explicitly compute (offline) a function mapping any PDE-parameter to the matrix of coefficients of optimal test functions (in a basis expansion) associated with that PDE-parameter. Next, as a remedy for the second shortcoming, we use low-rank approximation to hierarchically compress the (non-square) matrix of coefficients of optimal test functions. In order to accelerate this process, we train a neural network to learn a critical bottleneck of the compression algorithm (for a given set of PDE-parameters). When solving online the resulting (compressed) Petrov-Galerkin formulation, we employ a GMRES iterative solver with inexpensive matrix-vector multiplications thanks to the low-rank features of the compressed matrix. We perform experiments showing that the full online procedure is as fast as the original (unstable) Galerkin approach. In other words, we get the stabilization with hierarchical matrices and neural networks practically for free. We illustrate our findings by means of 2D Eriksson-Johnson and Helmholtz model problems.
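The reason GMRES pairs well with a compressed operator can be shown with a simple stand-in (an identity-plus-low-rank operator rather than the paper's hierarchical-matrix compression; the names and sizes below are assumptions): GMRES only needs matrix-vector products, and with factors of rank k << n each product costs O(nk) instead of O(n^2).

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(0)
n, k = 2000, 10
U = rng.normal(size=(n, k)) / np.sqrt(n)
V = rng.normal(size=(k, n)) / np.sqrt(n)
b = rng.normal(size=n)

# The system matrix A = I + U V is never formed explicitly; only its action is needed.
A = LinearOperator((n, n), matvec=lambda x: x + U @ (V @ x), dtype=np.float64)
x, info = gmres(A, b, atol=1e-10)
print(info, np.linalg.norm(x + U @ (V @ x) - b))   # 0 and a tiny residual norm
```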
Graph Neural Networks (GNNs) are a family of graph networks inspired by mechanisms existing between nodes on a graph. In recent years there has been an increased interest in GNN and their derivatives, i.e., Graph Attention Networks (GAT), Graph Convolutional Networks (GCN), and Graph Recurrent Networks (GRN). An increase in their usability in computer vision is also observed. The number of GNN applications in this field continues to expand; it includes video analysis and understanding, action and behavior recognition, computational photography, image and video synthesis from zero or few shots, and many more. This contribution aims to collect papers published about GNN-based approaches towards computer vision. They are described and summarized from three perspectives. Firstly, we investigate the architectures of Graph Neural Networks and their derivatives used in this area to provide accurate and explainable recommendations for the ensuing investigations. As for the other aspect, we also present datasets used in these works. Finally, using graph analysis, we also examine relations between GNN-based studies in computer vision and potential sources of inspiration identified outside of this field.
Recently proposed systems for open-domain question answering (OpenQA) require large amounts of training data to achieve state-of-the-art performance. However, data annotation is known to be time-consuming and therefore expensive to acquire. As a result, the appropriate datasets are available only for a handful of languages (mainly English and Chinese). In this work, we introduce and publicly release PolQA, the first Polish dataset for OpenQA. It consists of 7,000 questions, 87,525 manually labeled evidence passages, and a corpus of over 7,097,322 candidate passages. Each question is classified according to its formulation, type, as well as entity type of the answer. This resource allows us to evaluate the impact of different annotation choices on the performance of the QA system and propose an efficient annotation strategy that increases the passage retrieval performance by 10.55 p.p. while reducing the annotation cost by 82%.
A generalized understanding of protein dynamics is an unsolved scientific problem, the solution of which is critical to the interpretation of the structure-function relationships that govern essential biological processes. Here, we approach this problem by constructing coarse-grained molecular potentials based on artificial neural networks and grounded in statistical mechanics. For training, we build a unique dataset of unbiased all-atom molecular dynamics simulations of approximately 9 ms for twelve different proteins with multiple secondary structure arrangements. The coarse-grained models are capable of accelerating the dynamics by more than three orders of magnitude while preserving the thermodynamics of the systems. Coarse-grained simulations identify relevant structural states in the ensemble with comparable energetics to the all-atom systems. Furthermore, we show that a single coarse-grained potential can integrate all twelve proteins and can capture experimental structural features of mutated proteins. These results indicate that machine learning coarse-grained potentials could provide a feasible approach to simulate and understand protein dynamics.